2000-148788



(54) DEVICE AND METHOD FOR EXTRACTING TITLE AREA FROM DOCUMENT
IMAGE AND DOCUMENT RETRIEVING METHOD 

(57)Abstract: 

PROBLEM TO BE SOLVED: To improve the accuracy of title extraction and the
convenience of document retrieval by analyzing, from the character codes obtained
as the recognition result, how linguistically natural a character string is as a title,
and by awarding title-likelihood points based on centering, underlining, the size of
the character rectangles, and the like.

SOLUTION: A character recognition part 102 obtains the character code, recognition
confidence, character rectangle coordinates, and size of each character, each of
which contributes points toward title likelihood. A font identification part 103
obtains the font type of each character, which likewise contributes points. The
character codes are passed to the natural language analysis routine of a natural
language analysis part 104, which awards points for natural-language title
likelihood. Through this natural language processing, points are awarded to a
character string area whose word ending matches an ending that appears frequently
in titles. Total title-likelihood points are then calculated, also using centering,
underlining, the size of the character string rectangle, and the like, and the title
is identified from the total.
CLAIMS
[Claim(s)] 

[Claim 1] A device for extracting a title area from a document image, the device
having area identification means for cutting out character string areas as rectangles
from a document image input from an image input device, calculating title-likelihood
points based on attributes of said character string areas, and extracting a title, the
device comprising: character recognition means for performing character recognition
within each character string rectangle cut out by said area identification means; font
identification means for identifying the font of each character within each character
string rectangle cut out by said area identification means; natural language analysis
means for analyzing natural-language title likelihood based on the character codes
obtained as the recognition result of said character recognition means; and point
awarding means for awarding title-likelihood points to each character string rectangle
cut out by said area identification means, using centering, underlining, the size of
the character rectangles, and the like.

[Claim 2] A method for extracting a title area from a document image, in which
character string areas are cut out as rectangles from an input document image,
title-likelihood points are calculated based on attributes of said character string
areas, and a title is extracted, the method comprising: a first step of recognizing the
character codes in said character string area and judging whether the confidence of
the character code recognition is at or above a fixed threshold; and a second step of
adding title-likelihood points when the confidence is judged in said first step to be
at or above the fixed threshold, and determining and extracting the title area from
the total points.

[Claim 3] A method for extracting a title area from a document image, in which
character string areas are cut out as rectangles from an input document image,
title-likelihood points are calculated based on attributes of said character string
areas, and a title is extracted, the method comprising: a first step of performing
character recognition in said character string area and, during this character
recognition, obtaining the number of characters in the character string rectangle; a
second step of comparing said number of characters against the number of characters
of document titles and judging whether the number of character rectangles is within
a predetermined range; and a third step of adding title-likelihood points when the
number of character rectangles is judged in said second step to be within the
predetermined range, and determining and extracting the title area from the total
points.
[Claim 4] A method for extracting a title area from a document image, in which
character string areas are cut out as rectangles from an input document image,
title-likelihood points are calculated based on attributes of said character string
areas, and a title is extracted, the method comprising: a first step of performing
natural language processing on the recognition result of the character codes in said
character string area; a second step of judging, from the result of said first step,
whether the area ends with a noun (a nominal ending); and a third step of adding
title-likelihood points to an area judged in said second step to end with a noun, and
determining and extracting the title area from the total points.
[Claim 5] A method for extracting a title area from a document image, in which
character string areas are cut out as rectangles from an input document image,
title-likelihood points are calculated based on attributes of said character string
areas, and a title is extracted, the method comprising: a first step of performing
natural language processing on the recognition result of the character codes in said
character string area; a second step of comparing, based on the result of said first
step, the character code sequence in said character string area against a statistical
dictionary of word endings that occur frequently in titles, and judging whether the
character string area has an ending that matches a high-frequency ending; and a third
step of adding title-likelihood points to an area so judged in said second step, and
determining and extracting the title area from the total points.
[Claim 6] A method for extracting a title area from a document image, in which
character string areas are cut out as rectangles from an input document image,
title-likelihood points are calculated based on attributes of said character string
areas, and a title is extracted, the method comprising: a first step of performing
font identification processing on said character string area; a second step of
determining the font type of the characters based on the result of said font
identification processing and judging whether the area is a character area that uses
a specific font; and a third step of adding title-likelihood points to a character area
judged in said second step to use the specific font, and determining and extracting
the title area from the total points.
[Claim 7] A method for extracting a title area from a document image, in which
character string areas are cut out as rectangles from an input document image,
title-likelihood points are calculated based on attributes of said character string
areas, and a title is extracted, the method comprising: a first step of performing
font identification processing on said character string area; a second step of
creating, when determining the font type based on the result of said font
identification processing, a histogram of the font types of the whole document, and
judging whether the area is a character area that uses a font type with a low
frequency of occurrence; and a third step of adding title-likelihood points to a
character area so judged in said second step, and determining and extracting the
title area from the total points.

[Claim 8] A method for extracting a title area from a document image, in which
character string areas are cut out as rectangles from an input document image,
title-likelihood points are calculated based on attributes of said character string
areas, and a title is extracted, the method comprising: a first step of obtaining the
aspect ratio of each character rectangle in said character string rectangle; a second
step of judging, based on said aspect ratio, whether each character is a double-width
character; and a third step of adding title-likelihood points to a character area
judged to consist of said double-width characters, and determining and extracting
the title area from the total points.
[Claim 9] A method for extracting a title area from a document image, in which
character string areas are cut out as rectangles from an input document image,
title-likelihood points are calculated based on attributes of said character string
areas, and a title is extracted, the method comprising: a first step of performing
character recognition processing on said character string rectangle; a second step of
calculating the total width (the total height in the case of vertical writing) of the
character rectangles recognized by said character recognition processing, excluding
blank characters; a third step of judging whether said total is approximately half
that of said character string rectangle; and a fourth step of adding title-likelihood
points to a character string area judged in said third step to be approximately half,
and determining and extracting the title area from the total points.

[Claim 10] The method for extracting a title area from a document image according
to any one of claims 2 to 9, wherein the reference values used to decide whether to
add said title-likelihood points are variable and are set to optimum values learned
from the format of the documents input by each user.



[Claim 11] A document retrieval method comprising: a first step of recognizing a
document image, performing language processing on the recognition result, and
extracting keywords; a second step of recording together the keywords extracted in
said first step and the title extracted by the method for extracting a title area from
a document image according to any one of claims 2 to 10; and a third step of
performing document retrieval using the title recorded together with the keywords in
said second step.



DETAILED DESCRIPTION 



[Detailed Description of the Invention] 
[0001] 

[Field of the Invention] This invention relates to a device and a method for
extracting a title area from a document image, and to a document retrieval method.
To improve the convenience of retrieval, the device and method extract, as a title
area, the area of a document that accurately expresses its contents, working from a
database of document image data input from image input devices such as facsimile
machines and image scanners.
[0002] 

[Description of the Prior Art] Conventionally, when document images are to be
searched, an operator manually extracts or creates title information and keyword
information that accurately express the contents of each document, separately from
inputting the document image from the image input device, and attaches this
information to the document for convenience in later retrieval. For fixed-form
documents, a specific location (character string) in the document has been cut out
as the title or keyword.
[0003] Moreover, reference techniques that extract a title from a non-fixed-form
document using only layout features are disclosed in JP,9-134406,A, "Title
extraction device and method from a document image", and JP,5-274471,A, "Title
area extraction processing method for image documents".
[0004] 

[Problem(s) to be Solved by the Invention] However, in the prior art described
above, the operator's workload grows with the volume of documents, so having an
operator add title information or keyword information leads to an ever greater work
burden. Moreover, automatic cutting out of a specific location is aimed only at
fixed-form documents and cannot be used for non-fixed-form documents, which limits
its convenience.

[0005] In JP,9-134406,A and JP,5-274471,A, disclosed previously, the title is
extracted with attention only to layout features. The hit rate for titles that
accurately express the contents of a document is therefore not necessarily
satisfactory, which causes problems such as hindering later document retrieval.

[0006] This invention was made in view of the above. Its object is to improve the
accuracy of title extraction and the convenience of document retrieval by expressing
features specific to titles as points, without depending on a specific document
format, and automatically extracting as the title the character string area with the
most points.
[0007] 

[Means for Solving the Problem] To achieve the above object, the device for
extracting a title area from a document image according to claim 1 is a device that
has area identification means for cutting out character string areas as rectangles
from a document image input from an image input device, calculates title-likelihood
points based on attributes of said character string areas, and extracts a title. The
device has: character recognition means for performing character recognition within
each character string rectangle cut out by said area identification means; font
identification means for identifying the font of each character within each character
string rectangle cut out by said area identification means; natural language analysis
means for analyzing natural-language title likelihood based on the character codes
obtained as the recognition result of said character recognition means; and point
awarding means for awarding title-likelihood points to each character string
rectangle cut out by said area identification means, using centering, underlining,
the size of the character rectangles, and the like.
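
The point-based selection described in [0007] can be pictured with a minimal sketch.
It is only an illustration under stated assumptions: the `StringArea` record, the
three scorer functions, the point values, and the 16.0 height threshold are all
hypothetical and do not come from this specification. The sketch shows only the
general idea that several independent scorers each award points and the area with
the highest total is taken as the title.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StringArea:
    """Hypothetical record for one character string rectangle."""
    text: str
    centered: bool = False
    underlined: bool = False
    char_height: float = 10.0  # assumed unit: points


# A scorer maps one area to the title-likelihood points it awards.
Scorer = Callable[[StringArea], int]


def centering_points(area: StringArea) -> int:
    # Illustrative value: centered lines earn 2 points.
    return 2 if area.centered else 0


def underline_points(area: StringArea) -> int:
    # Illustrative value: underlined lines earn 1 point.
    return 1 if area.underlined else 0


def size_points(area: StringArea, min_height: float = 16.0) -> int:
    # Illustrative value: large characters earn 2 points.
    return 2 if area.char_height >= min_height else 0


def total_points(area: StringArea, scorers: List[Scorer]) -> int:
    """Sum the points awarded to one area by every scorer."""
    return sum(scorer(area) for scorer in scorers)


def extract_title(areas: List[StringArea],
                  scorers: List[Scorer]) -> Optional[StringArea]:
    """Return the area with the highest total, or None if there are no areas."""
    if not areas:
        return None
    return max(areas, key=lambda area: total_points(area, scorers))
```

Further scorers (recognition confidence, font, word ending, and so on) could be
appended to the same list without changing `extract_title`.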

[0008] Moreover, the method for extracting a title area from a document image
according to claim 2 is a method in which character string areas are cut out as
rectangles from an input document image, title-likelihood points are calculated based
on attributes of said character string areas, and a title is extracted. The method
includes: a first step of recognizing the character codes in said character string
area and judging whether the confidence of the character code recognition is at or
above a fixed threshold; and a second step of adding title-likelihood points when the
confidence is judged in said first step to be at or above the fixed threshold, and
determining and extracting the title area from the total points.
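
The threshold test of [0008] might be sketched as follows. The specification does
not say how per-character confidences are combined, so taking their mean is an
assumption, as are the function name `confidence_points`, the default threshold of
0.8, and the point value.

```python
from typing import Sequence


def confidence_points(char_confidences: Sequence[float],
                      threshold: float = 0.8,
                      points: int = 1) -> int:
    """Award points when the mean recognition confidence reaches the threshold.

    Using the mean is an assumption made for this sketch; the threshold and
    point values are hypothetical.
    """
    if not char_confidences:
        return 0
    mean = sum(char_confidences) / len(char_confidences)
    return points if mean >= threshold else 0
```

An area whose characters were recognized with low confidence receives no points
from this scorer, so poorly recognized text is less likely to be chosen as the title.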

[0009] Moreover, the method for extracting a title area from a document image
according to claim 3 is a method in which character string areas are cut out as
rectangles from an input document image, title-likelihood points are calculated based
on attributes of said character string areas, and a title is extracted. The method
includes: a first step of performing character recognition in said character string
area and, during this character recognition, obtaining the number of characters in
the character string rectangle; a second step of comparing said number of characters
against the number of characters of document titles and judging whether the number
of character rectangles is within a predetermined range; and a third step of adding
title-likelihood points when the number of character rectangles is judged in said
second step to be within the predetermined range, and determining and extracting the
title area from the total points.
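
As a minimal sketch of the character count test in [0009], the check reduces to a
range comparison. The bounds of 3 and 40 characters and the name `length_points` are
hypothetical; the specification only says the count is compared with the number of
characters of document titles.

```python
def length_points(num_chars: int,
                  min_chars: int = 3,
                  max_chars: int = 40,
                  points: int = 1) -> int:
    """Award points when the character count is within a title-like range.

    The default bounds are illustrative assumptions, not values from the source.
    """
    return points if min_chars <= num_chars <= max_chars else 0
```

For example, `length_points(len("Annual Report"))` awards a point, while a
200-character body paragraph receives none.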

[0010] Moreover, the method for extracting a title area from a document image
according to claim 4 is a method in which character string areas are cut out as
rectangles from an input document image, title-likelihood points are calculated based
on attributes of said character string areas, and a title is extracted. The method
includes: a first step of performing natural language processing on the recognition
result of the character codes in said character string area; a second step of
judging, from the result of said first step, whether the area ends with a noun (a
nominal ending); and a third step of adding title-likelihood points to an area judged
in said second step to end with a noun, and determining and extracting the title area
from the total points.

[0011] Moreover, the method for extracting a title area from a document image
according to claim 5 is a method in which character string areas are cut out as
rectangles from an input document image, title-likelihood points are calculated based
on attributes of said character string areas, and a title is extracted. The method
includes: a first step of performing natural language processing on the recognition
result of the character codes in said character string area; a second step of
comparing, based on the result of said first step, the character code sequence in
said character string area against a statistical dictionary of word endings that
occur frequently in titles, and judging whether the character string area has an
ending that matches a high-frequency ending; and a third step of adding
title-likelihood points to an area so judged in said second step, and determining and
extracting the title area from the total points.
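
The ending dictionary lookup of [0011] can be illustrated with a small sketch. The
dictionary contents, the frequency cutoff, and the name `ending_points` are all
hypothetical, and English endings are used purely for readability; the specification
targets Japanese text and does not give the dictionary format.

```python
from typing import Dict

# Hypothetical statistics: word ending -> relative frequency among known titles.
TITLE_ENDING_FREQ: Dict[str, float] = {
    "report": 0.12,
    "proposal": 0.08,
    "minutes": 0.05,
    "the": 0.001,
}


def ending_points(text: str,
                  ending_freq: Dict[str, float] = TITLE_ENDING_FREQ,
                  min_freq: float = 0.03,
                  points: int = 2) -> int:
    """Award points when the text ends with a high-frequency title ending."""
    # Ignore trailing punctuation and case before comparing endings.
    stripped = text.rstrip(" .:;!?").lower()
    for ending, freq in ending_freq.items():
        if freq >= min_freq and stripped.endswith(ending):
            return points
    return 0
```

Only endings at or above `min_freq` count, so an ending that appears in the
dictionary but is rare in titles earns nothing.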

[0012] Moreover, the method for extracting a title area from a document image
according to claim 6 is a method in which character string areas are cut out as
rectangles from an input document image, title-likelihood points are calculated based
on attributes of said character string areas, and a title is extracted. The method
includes: a first step of performing font identification processing on said character
string area; a second step of determining the font type of the characters based on
the result of said font identification processing and judging whether the area is a
character area that uses a specific font; and a third step of adding title-likelihood
points to a character area judged in said second step to use the specific font, and
determining and extracting the title area from the total points.

[0013] Moreover, the method for extracting a title area from a document image
according to claim 7 is a method in which character string areas are cut out as
rectangles from an input document image, title-likelihood points are calculated based
on attributes of said character string areas, and a title is extracted. The method
includes: a first step of performing font identification processing on said character
string area; a second step of creating, when determining the font type based on the
result of said font identification processing, a histogram of the font types of the
whole document, and judging whether the area is a character area that uses a font
type with a low frequency of occurrence; and a third step of adding title-likelihood
points to a character area so judged in said second step, and determining and
extracting the title area from the total points.
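
The font histogram test of [0013] might look like the following sketch. It assumes
one font label per character string area and treats a font as rare when it accounts
for at most 20% of the areas; both the representation and the cutoff are
hypothetical, as is the name `rare_font_areas`.

```python
from collections import Counter
from typing import List


def rare_font_areas(area_fonts: List[str], max_share: float = 0.2) -> List[int]:
    """Return indices of areas whose font is rare across the whole document.

    area_fonts holds one font label per character string area (an assumed
    representation). A font is treated as rare when its share of all areas
    is at most max_share, which is an illustrative cutoff.
    """
    histogram = Counter(area_fonts)
    total = len(area_fonts)
    return [
        index
        for index, font in enumerate(area_fonts)
        if histogram[font] / total <= max_share
    ]
```

In a document where nine areas are set in one font and a single area in another,
only the single area is returned and would receive the title-likelihood points.
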
[0014] Moreover, the method for extracting a title area from a document image
according to claim 8 is a method in which character string areas are cut out as
rectangles from an input document image, title-likelihood points are calculated based
on attributes of said character string areas, and a title is extracted. The method
includes: a first step of obtaining the aspect ratio of each character rectangle in
said character string rectangle; a second step of judging, based on said aspect
ratio, whether each character is a double-width character; and a third step of adding
title-likelihood points to a character area judged to consist of said double-width
characters, and determining and extracting the title area from the total points.
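
A minimal sketch of the aspect ratio test in [0014] follows. It assumes that a
double-width character has a width to height ratio of roughly 2:1 and that every
character in the area must pass; the tolerance, the all-characters rule, and the
function names are assumptions made for illustration.

```python
from typing import List, Tuple


def is_double_width(width: float, height: float,
                    ratio: float = 2.0, tolerance: float = 0.25) -> bool:
    """Judge one character rectangle by its width / height ratio.

    The 2:1 target ratio and the tolerance are assumptions for this sketch.
    """
    if height <= 0:
        return False
    return abs(width / height - ratio) <= tolerance


def double_width_points(rects: List[Tuple[float, float]], points: int = 1) -> int:
    """Award points when every (width, height) rectangle looks double-width."""
    if not rects:
        return 0
    if all(is_double_width(w, h) for w, h in rects):
        return points
    return 0
```

Requiring every character to pass is the strictest reading; a majority rule would
be an equally plausible variant.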

[0015] Moreover, the method for extracting a title area from a document image
according to claim 9 is a method in which character string areas are cut out as
rectangles from an input document image, title-likelihood points are calculated based
on attributes of said character string areas, and a title is extracted. The method
includes: a first step of performing character recognition processing on said
character string rectangle; a second step of calculating the total width (the total
height in the case of vertical writing) of the character rectangles recognized by
said character recognition processing, excluding blank characters; a third step of
judging whether said total is approximately half that of said character string
rectangle; and a fourth step of adding title-likelihood points to a character string
area judged in said third step to be approximately half, and determining and
extracting the title area from the total points.
[0016] Moreover, in the title field extraction method from a document image according
to claim 10, the reference value used to decide whether to add said point of
title-likeness is variable, and is set to an optimum value learned according to the
input document format of each user.

[0017] Moreover, the document-retrieval method according to claim 11 includes: a 1st
process which recognizes a document image, performs language processing on the result,
and extracts a keyword; a 2nd process which writes together the keyword extracted at
said 1st process and the title extracted by the title field extraction method from a
document image according to any one of claims 2 to 10; and a 3rd process which performs
a document retrieval using the title written together at said 2nd process.
[0018] 

[Embodiment of the Invention] Hereafter, the title field extractor from a document
image, the title field extraction method, and the document-retrieval method of this
invention are explained with reference to the accompanying drawings.
[0019] Drawing 1 is a block diagram showing the system configuration which performs
title field extraction processing according to the embodiment of this invention. In the
drawing, 101 is a field discernment section serving as a field discernment means which
cuts out a string area as a rectangle from the document image inputted from a picture
input device (not shown) such as a facsimile or an image scanner; 102 is a character
recognition section serving as a character recognition means which performs character
recognition based on the discernment result of the field discernment section 101; 103
is a font discernment section serving as a font discernment means which performs font
discernment based on the discernment result of the field discernment section 101; 104
is a natural language analysis section serving as a natural language analysis means
which analyzes title-likeness in terms of natural language based on the character codes
obtained as the recognition result of the character recognition section 102; and 105 is
a point attachment section serving as a point attachment means which performs point
attachment of title-likeness using conventionally used features such as centering,
underlines, and the size of the character rectangle.

[0020] In the configuration of drawing 1, when a document image is inputted from a
picture input device (not shown), preprocessing such as skew correction is performed,
field discernment processing is performed by the field discernment section 101, and the
coordinate values and sizes of the character string rectangles are acquired.
Subsequently, character recognition by the character recognition section 102 and font
discernment by the font discernment section 103 are performed using the result of the
field discernment processing by the field discernment section 101.

[0021] In the character recognition section 102, the character code and reliability of
each character, and the coordinate values and size of each character rectangle, are
obtained as material for point attachment of title-likeness. Moreover, in the font
discernment section 103, the font classification of each character is obtained as
material for point attachment of title-likeness.

[0022] Moreover, the character codes obtained by the character recognition section 102
are also supplied to the natural language analysis section 104, which gives the point
of title-likeness in terms of natural language, that is, the point of title-likeness to
a field which ends with a substantive (noun). Furthermore, in natural language
processing, a statistical information dictionary of endings which occur frequently in
titles is compared with the character code sequence in a character field, and the point
of title-likeness is given to a string area whose ending matches a high-frequency
ending in the dictionary.

[0023] Moreover, the total point of title-likeness is calculated by adding, to the
points of title-likeness described above, points based on conventionally used features
such as centering, underlines, and the size of the character string rectangle, and a
title is identified.
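
The summation described above can be sketched as follows. This is only an illustrative
sketch: the dictionary layout, the feature names, and the function name pick_title are
assumptions made for the example and do not appear in the specification.

```python
def pick_title(regions):
    """Return the string area whose summed title-likeness points are highest.

    Each region is a dict with hypothetical keys "text" and "points";
    "points" maps a feature name to the points that feature contributed.
    """
    def total(region):
        return sum(region["points"].values())
    return max(regions, key=total)


regions = [
    {"text": "Quarterly Report",
     "points": {"font": 2, "centering": 1, "underline": 1}},
    {"text": "This paragraph describes the results in detail.",
     "points": {"font": 0, "centering": 0, "underline": 0}},
]
best = pick_title(regions)
```

Keeping the per-feature points separate until the final sum makes it easy to run the
individual judgments alone or in combination.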

[0024] Next, the series of title extraction methods of this invention are explained in
order with reference to the flow charts shown in drawing 3 to drawing 8. Note that,
with the configuration of drawing 1, these title extraction methods can be executed in
a combination of two or more, independently, or selectively.
[0025] Drawing 3 is a flow chart which shows the 1st title extraction method according
to the embodiment, and shows an example in which the point of title-likeness is added
when the reliability of character code discernment is at or above a fixed threshold.
First, a document image is inputted from a document input unit (not shown) (S301), and
a string area is identified by the field discernment section 101 (S302). Then, the
character codes in the string area are recognized, and it is judged whether the
reliability of character code discernment is at or above a fixed threshold (S303). When
it is judged to be at or above the fixed threshold, the point of title-likeness is
added, and a title field is determined and extracted from the total value (S304).
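
A minimal sketch of the judgment at S303 follows. The specification does not say how
the reliability of individual characters is aggregated over a string area, so the use
of the mean, the threshold of 0.9, and the name reliability_point are all assumptions.

```python
def reliability_point(char_confidences, threshold=0.9, point=1):
    """Give a title-likeness point when recognition reliability is high enough.

    char_confidences: per-character reliabilities in [0, 1] for one string area.
    The mean is used as the area-level reliability (an assumption).
    """
    if not char_confidences:
        return 0
    mean_confidence = sum(char_confidences) / len(char_confidences)
    return point if mean_confidence >= threshold else 0
```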

[0026] Drawing 4 is a flow chart which shows the 2nd title extraction method according
to the embodiment. First, a document image is inputted from a document input unit (not
shown) (S401), and a string area is identified by the field discernment section 101
(S402). Then, at the time of character recognition, the number of characters in the
character string rectangle is obtained (S403). This number is compared with the number
of characters typical of document titles (S404), and it is judged whether the number of
character rectangles is within a predetermined value (S405). If the number of character
rectangles is judged to be within the predetermined value, the point of title-likeness
is added, and a title field is determined and extracted from the total value (S406).

[0027] That is, at the time of character code recognition in a string area, the number
of characters in the character string rectangle is obtained and compared against
statistics on the number of characters in document titles, which are held separately as
dictionary information, and the point of title-likeness is given to a character string
rectangle whose number of characters is appropriate for a title.
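
The comparison against title-length statistics might look like the sketch below. The
specification does not define the statistic or the acceptance range; the mean plus or
minus k standard deviations, the sample lengths, and both function names are
assumptions.

```python
import statistics


def title_length_range(sample_title_lengths, k=2.0):
    """Derive an accepted character-count range from known title lengths."""
    mean = statistics.mean(sample_title_lengths)
    spread = statistics.pstdev(sample_title_lengths)
    return (mean - k * spread, mean + k * spread)


def length_point(num_chars, length_range, point=1):
    """Give a point when the character count falls inside the range."""
    low, high = length_range
    return point if low <= num_chars <= high else 0


# Hypothetical "dictionary information": lengths of previously seen titles.
length_range = title_length_range([8, 12, 10, 15, 9])
```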

[0028] Drawing 5 is a flow chart which shows the 3rd title extraction method according
to the embodiment. First, a document image is inputted from a document input unit (not
shown) (S501), and a string area is identified by the field discernment section 101
(S502). Then, natural language processing is performed on the recognition result of the
character codes in the string area (S503), and it is judged whether the area satisfies
a predetermined condition, for example, whether it ends with a substantive (noun)
(S504). If the area is judged to satisfy the predetermined condition, the point of
title-likeness is added, and a title field is determined and extracted from the total
value (S505).

[0029] Moreover, in the above-mentioned language processing, a statistical information
dictionary of endings which occur frequently in titles may be compared with the
character code sequence in a character field, and the point of title-likeness may be
given to a string area whose ending matches a high-frequency ending in the dictionary.
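
The ending-dictionary comparison can be sketched as a suffix match. The dictionary
contents, the frequency cut-off, and the name ending_point are assumptions made for
illustration; English endings are used only to keep the example readable.

```python
def ending_point(text, ending_frequency, min_frequency=0.05, point=1):
    """Give a point when the text ends with a high-frequency title ending.

    ending_frequency maps an ending to its relative frequency among titles
    (a stand-in for the statistical information dictionary).
    """
    stripped = text.rstrip()
    for ending, frequency in ending_frequency.items():
        if frequency >= min_frequency and stripped.endswith(ending):
            return point
    return 0


# Hypothetical dictionary: "Report" and "Guide" are common title endings,
# while a sentence-final period is rare in titles.
ending_frequency = {"Report": 0.09, "Guide": 0.07, ".": 0.01}
```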

[0030] Drawing 6 is a flow chart which shows the 4th title extraction method according
to the embodiment. First, a document image is inputted from a document input unit (not
shown) (S601), and a string area is identified by the field discernment section 101
(S602). Then, font discernment processing is performed (S603), and it is judged whether
the area contains a predetermined font (font style) (S604). That is, the font style of
each character is distinguished, and it is judged whether the area is a character field
which uses a specific font; alternatively, a histogram of the font styles of the whole
document is created at the time of font style distinction, and it is judged whether the
area is a character field which uses a font style with a low frequency of occurrence.
When the area is judged to be such a field, the point of title-likeness is added, and a
title field is determined and extracted from the total value (S605).
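
The font histogram judgment can be sketched with a simple counter. The 10 percent share
used to decide that a font style has a low frequency of occurrence, the font labels,
and the function names are assumptions, not values from the specification.

```python
from collections import Counter


def rare_fonts(char_fonts, max_share=0.1):
    """Return the font styles whose share of all characters is at most max_share.

    char_fonts: one font label per recognized character in the whole document.
    """
    counts = Counter(char_fonts)  # histogram of font styles
    total = sum(counts.values())
    return {font for font, n in counts.items() if n / total <= max_share}


def font_point(region_fonts, rare, point=1):
    """Give a point when a string area uses any rarely occurring font style."""
    return point if any(font in rare for font in region_fonts) else 0


document_fonts = ["mincho"] * 95 + ["gothic-bold"] * 5
rare = rare_fonts(document_fonts)
```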

[0031] Drawing 7 is a flow chart which shows the 5th title extraction method according
to the embodiment. First, a document image is inputted from a document input unit (not
shown) (S701), and a string area is identified by the field discernment section 101
(S702). Then, the aspect ratio of each character rectangle in the character string
rectangle is obtained (S703), and it is judged whether character rectangles with an
aspect ratio close to horizontal:vertical = 2:1 account for at least a fixed proportion
of the character rectangles in the character string rectangle (S704). If they account
for at least the fixed proportion, the point of title-likeness is added, and a title
field is determined and extracted from the total value (S705).
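
The double width judgment at S703 and S704 can be sketched as below. The tolerance
around the 2:1 ratio and the required share of double width characters are assumptions,
since the specification says only "near" and "a fixed rate".

```python
def is_double_width(width, height, target=2.0, tolerance=0.3):
    """Judge whether a character rectangle is close to horizontal:vertical = 2:1."""
    if height <= 0:
        return False
    return abs(width / height - target) <= tolerance


def double_width_point(char_rects, min_share=0.5, point=1):
    """Give a point when enough character rectangles are double width.

    char_rects: (width, height) pairs for the characters in one string area.
    """
    if not char_rects:
        return 0
    hits = sum(1 for width, height in char_rects if is_double_width(width, height))
    return point if hits / len(char_rects) >= min_share else 0
```
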
[0032] Drawing 8 is a flow chart which shows the 6th title extraction method according
to the embodiment. First, a document image is inputted from a document input unit (not
shown) (S801), and a string area is identified by the field discernment section 101
(S802). Then, character recognition processing is performed (S803), and the total of
the widths (heights, in the case of vertical writing) of the character rectangles
recognized by the character recognition processing, excluding blank characters, is
calculated (S804). It is then judged whether the total value is approximately one half
of the character rectangle field (S805). If the total value is approximately one half,
the point of title-likeness is added, and a title field is determined and extracted
from the total value (S806).
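
A sketch of the judgment at S804 and S805 follows. It assumes that "one half" is
measured against the width of the enclosing character string rectangle, which the text
does not state explicitly; the tolerance and the name spacing_point are also
assumptions.

```python
def spacing_point(char_widths, region_width, target=0.5, tolerance=0.1, point=1):
    """Give a point when non-blank character widths sum to about half the region.

    char_widths: widths of recognized non-blank characters (heights for
    vertical writing); region_width: extent of the string area in the
    writing direction.
    """
    if region_width <= 0:
        return 0
    ratio = sum(char_widths) / region_width
    return point if abs(ratio - target) <= tolerance else 0
```

A ratio near one half means that roughly as much space is left blank as is occupied by
characters, which is what evenly spaced title text would produce.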

[0033] Note that the thresholds needed in the embodiment described above need not be
fixed. They may be made variable to suit the documents each user inputs: the optimal
thresholds may be learned from the document format each user uses and set in place of
the initial values.

[0034] Furthermore, like ****, it is also possible to judge double width characters or
equal spacing and to give the point of title-likeness to them by a secondary
combination based on the provisional points obtained, as shown in drawing 2.

[0035] To elaborate, using the size of each character rectangle in the character string
rectangle obtained at the time of character code recognition, the aspect ratio of each
character rectangle is computed to judge double width characters, and the point of
title-likeness is given to a string area which uses double width characters.

[0036] Moreover, the character density within a rectangle is computed using the size of
each character rectangle and of the string area to which it belongs, together with the
number of characters in the character string rectangle obtained at the time of
character code recognition, and equal spacing is judged from that value. The point of
title-likeness is then given to a string area judged to be equally spaced.

[0037] Information retrieval can also be performed using the title field extraction
method described above. Drawing 9 is a flow chart which shows the information retrieval
method according to the embodiment. First, a document image is recognized (S901),
language processing is performed on the result, and a keyword is extracted (S902).
Furthermore, the extracted keyword and the title extracted by the above-mentioned title
field extraction method are written together (S903), and a document retrieval is
performed using the written-together title (S904). This improves the convenience of
retrieval.
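
Writing the keyword and title together and then retrieving on them can be sketched as
follows. The entry layout, the case-insensitive substring match, and the function names
are assumptions; the specification does not describe the retrieval mechanism itself.

```python
def build_index_entry(doc_id, keywords, title):
    """Record the extracted title alongside the extracted keywords (S903)."""
    return {"doc_id": doc_id, "title": title, "terms": [title] + list(keywords)}


def search(entries, query):
    """Return ids of documents whose title or keywords contain the query (S904)."""
    needle = query.lower()
    return [entry["doc_id"] for entry in entries
            if any(needle in term.lower() for term in entry["terms"])]


entries = [
    build_index_entry("doc1", ["sales", "forecast"], "Quarterly Sales Report"),
    build_index_entry("doc2", ["patent"], "Title Extraction Method"),
]
```
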
[0038] 

[Effect of the Invention] As explained above, the title field extractor from a document
image according to this invention (claim 1) cuts out a string area as a rectangle from
an inputted document image, performs point count of title-likeness based on the
attributes of the string area, and extracts a title, and is provided with: a field
discernment means which cuts out the string area as a rectangle; a character
recognition means which performs character recognition within the character string
rectangle cut out by the field discernment means; a font discernment means which
performs font discernment for every character in the character string rectangle cut out
by the field discernment means; a natural language analysis means which analyzes
title-likeness in terms of natural language based on the character codes obtained as
the recognition result of the character recognition means; and a point attachment means
which performs point attachment of title-likeness on the character string rectangle cut
out by the field discernment means, using centering, underlines, the size of the
character rectangle, and the like. Since features inherent to titles are used for point
attachment without depending on a specific document format, the string area with the
most points is extracted automatically as the title, and an apparatus which improves
the accuracy of title extraction and the convenience of document retrieval can be
provided.
[0039] Moreover, according to the title field extraction method from a document image
according to this invention (claim 2), the character codes in a string area are
recognized, it is judged whether the reliability of character code discernment is at or
above a fixed threshold, and the point of title-likeness is added when it is. Since a
title field is determined and extracted from the total value, the string area with the
most points is extracted automatically as the title, and the accuracy of title
extraction and the convenience of document retrieval can be improved.
[0040] Moreover, according to the title field extraction method from a document image
according to this invention (claim 3), character recognition is performed in a string
area, the number of characters in the character string rectangle is obtained at the
time of this character recognition, this number is compared with the number of
characters typical of document titles to judge whether the number of character
rectangles is within a predetermined value, and when it is, the point of title-likeness
is added to the corresponding string area. Since a title field is determined and
extracted from the total value, the string area with the most points is extracted
automatically as the title, and the accuracy of title extraction and the convenience of
document retrieval can be improved.
[0041] Moreover, according to the title field extraction method from a document image
according to this invention (claim 4), natural language processing is performed on the
recognition result of the character codes in a string area, it is judged, as a result
of ****, whether the area ends with a substantive (noun), and the point of
title-likeness is added to an area which does. Since a title field is determined and
extracted from the total value, the string area with the most points is extracted
automatically as the title, and the accuracy of title extraction and the convenience of
document retrieval can be improved.

[0042] Moreover, according to the title field extraction method from a document image
according to this invention (claim 5), natural language processing is performed on the
recognition result of the character codes in a string area, a statistical information
dictionary of endings which occur frequently in titles is compared with the character
code sequence in the string area, it is judged whether the ending of the string area
matches a high-frequency ending in the dictionary, and the point of title-likeness is
added to an area which does. Since a title field is determined and extracted from the
total value, the string area with the most points is extracted automatically as the
title, and the accuracy of title extraction and the convenience of document retrieval
can be improved.

[0043] Moreover, according to the title field extraction method from a document image
according to this invention (claim 6), font discernment processing is performed on a
string area, the font style of each character is distinguished based on the result, it
is judged whether the area is a character field which uses a specific font, and the
point of title-likeness is added to a character field which does. Since a title field
is determined and extracted from the total value, the string area with the most points
is extracted automatically as the title, and the accuracy of title extraction and the
convenience of document retrieval can be improved.
[0044] Moreover, according to the title field extraction method from a document image
according to this invention (claim 7), font discernment processing is performed on a
string area, a histogram of the font styles of the whole document is created based on
the result at the time of font style distinction, it is judged whether the area is a
character field which uses a font style with a low frequency of occurrence, and the
point of title-likeness is added to a character field so judged. Since a title field is
determined and extracted from the total value, the string area with the most points is
extracted automatically as the title, and the accuracy of title extraction and the
convenience of document retrieval can be improved.
[0045] Moreover, according to the title field extraction method from a document image
according to this invention (claim 8), the aspect ratio of each character rectangle in
a character string rectangle is obtained, it is judged based on the aspect ratio
whether a character is a double width character, and the point of title-likeness is
added to a character field judged to consist of double width characters. Since a title
field is determined and extracted from the total value, the string area with the most
points is extracted automatically as the title, and the accuracy of title extraction
and the convenience of document retrieval can be improved.

[0046] Moreover, according to the title field extraction method from a document image
according to this invention (claim 9), character recognition processing is performed on
a character string rectangle, the total of the widths (heights, in the case of vertical
writing) of the character rectangles recognized by the character recognition
processing, excluding blank characters, is calculated, it is judged whether the total
value is approximately one half of the character rectangle field, and the point of
title-likeness is added to a string area judged to be approximately one half. Since a
title field is determined and extracted from the total value, the string area with the
most points is extracted automatically as the title, and the accuracy of title
extraction and the convenience of document retrieval can be improved.

[0047] Moreover, according to the title field extraction method from a document image
according to this invention (claim 10), in the title field extraction method from a
document image according to any one of claims 2 to 9, a more accurate title is
extracted automatically by making variable the reference value used to decide whether
to add the point of title-likeness and setting it to the optimum value learned
according to the input document format of each user.

[0048] Moreover, according to the document-retrieval method according to this invention
(claim 11), the keyword extracted by performing character recognition on a document
image and applying language processing to the result is written together with the title
extracted by the title field extraction method from a document image according to any
one of claims 2 to 9, and a document retrieval is performed using this written-together
title, that is, a more accurate title. The convenience of document retrieval therefore
improves.



DESCRIPTION OF DRAWINGS 



[Brief Description of the Drawings] 

[Drawing 1] It is a block diagram showing the system configuration which performs title
field extraction processing according to the embodiment of this invention.
[Drawing 2] It is a block diagram showing those points of title-likeness, among the
points used for the title field extraction processing according to the embodiment of
this invention, that are obtained secondarily.

[Drawing 3] It is a flow chart which shows the 1st title extraction method according to
the embodiment of this invention.

[Drawing 4] It is a flow chart which shows the 2nd title extraction method according to
the embodiment of this invention.

[Drawing 5] It is a flow chart which shows the 3rd title extraction method according to
the embodiment of this invention.

[Drawing 6] It is a flow chart which shows the 4th title extraction method according to
the embodiment of this invention.

[Drawing 7] It is a flow chart which shows the 5th title extraction method according to
the embodiment of this invention.

[Drawing 8] It is a flow chart which shows the 6th title extraction method according to
the embodiment of this invention.

[Drawing 9] It is a flow chart which shows the information retrieval method according
to the embodiment of this invention.
[Description of Notations] 

101 Field Discernment Section 

102 Character Recognition Section 

103 Font Discernment Section 

104 Natural Language Analysis Section 

105 Point Attachment Section



